Warp architecture and implementation

M. Annaratone, E. Arnould, T. Gross, H. T. Kung, M. S. Lam 

June 1986 ACM SIGARCH Computer Architecture News, Proceedings of the 13th
annual international symposium on Computer architecture ISCA '86, Volume 14 Issue 2

Publisher: IEEE Computer Society Press, ACM Press

Full text available: pdf(1.17 MB) Additional Information: full citation, abstract, references, citings, index terms



This paper describes the scan line array processor (SLAP), a new architecture designed for 
high-performance yet low-cost image computation. A SLAP is a SIMD linear array of 
processors, and hence is easy to build and scales well with VLSI technology; yet 
appropriate special features and programming techniques make it efficient for a 
surprisingly wide variety of low and medium level computer vision tasks. We describe the 
basic SLAP concept and some of its variants, discuss a particular planne ... 

The family of concurrent logic programming languages 
Ehud Shapiro 

September 1989 ACM Computing Surveys (CSUR), Volume 21 Issue 3
Publisher: ACM Press

Full text available: pdf(9.62 MB) Additional Information: full citation, abstract, references, citings, index terms


Concurrent logic languages are high-level programming languages for parallel and 
distributed systems that offer a wide range of both known and novel concurrent 
programming techniques. Being logic programming languages, they preserve many 
advantages of the abstract logic programming model, including the logical reading of 
programs and computations, the convenience of representing data structures with logical 
terms and manipulating them using unification, and the amenability to 
metaprogrammin ... 



Warp architecture and implementation 

Marco Annaratone, Emmanuel Arnould, Thomas Gross, H. T. Kung, Monica S. Lam, Onat 
Menzilcioglu, Ken Sarocky, Jon A. Webb 

August 1998 25 years of the international symposia on Computer architecture 
(selected papers) ISCA '98 

Publisher: ACM Press 

A decade of reconfigurable computing: a visionary retrospective
R. Hartenstein 

March 2001 Proceedings of the conference on Design, automation and test in Europe
DATE '01

Publisher: IEEE Press

Full text available: pdf(768.00 KB) Additional Information: full citation, references, citings, index terms



Overview of a high-performance programmable pipeline structure
François Bodin, François Charot, Charles Wagner

June 1989 Proceedings of the 3rd international conference on Supercomputing ICS '89

Publisher: ACM Press

Full text available: pdf(2.05 MB) Additional Information: full citation, abstract, references, citings, index terms

This paper aims at describing a high-performance programmable pipeline architecture 
consisting of a linear array of PCS processors. The PCS processor which is capable of 
performing 20 million floating-point operations per second (20 MFLOPS) has been built 
from off-the-shelf chips on a wire-wrapped board. The prototype processor is attached to 
a SUN-3 workstation. Efficient microcode is generated using the microcode compiler that 
has been designed and implemented. The microcode op ... 

Three-dimensional finite-element analyses: implications for computer architectures 
Valerie E. Taylor, Abhiram Ranade, David G. Messerschmitt 

August 1991 Proceedings of the 1991 ACM/IEEE conference on Supercomputing 
Supercomputing '91 

Publisher: ACM Press 

Full text available: pdf(849.04 KB) Additional Information: full citation, references, index terms



7 Evaluation of the Raw Microprocessor: An Exposed-Wire-Delay Architecture for ILP
and Streams

Michael Bedford Taylor, Walter Lee, Jason Miller, David Wentzlaff, Ian Bratt, Ben Greenwald,
Henry Hoffmann, Paul Johnson, Jason Kim, James Psota, Arvind Saraf, Nathan Shnidman,
Volker Strumpen, Matt Frank, Saman Amarasinghe, Anant Agarwal
March 2004 ACM SIGARCH Computer Architecture News, Proceedings of the 31st
annual international symposium on Computer architecture ISCA '04,
Volume 32 Issue 2
Publisher: IEEE Computer Society, ACM Press

Full text available: pdf(376.05 KB) Additional Information: full citation, abstract, citings

This paper evaluates the Raw microprocessor. Raw addresses the challenge of building a
general-purpose architecture that performs well on a larger class of stream and embedded
computing applications than existing microprocessors, while still running existing ILP-based
sequential programs with reasonable performance in the face of increasing wire delays.
Raw approaches this challenge by implementing plenty of on-chip resources - including
logic, wires, and pins - in a tiled arrangement, and exposing the ...

8 The white dwarf: a high-performance application-specific processor
A. Wolfe, M. Breternitz, C. Stephens, A. L Ting, D. B. Kirk, R. P. Bianchini, J. P. Shen
May 1988 ACM SIGARCH Computer Architecture News, Proceedings of the 15th
Annual International Symposium on Computer architecture ISCA '88,
Volume 16 Issue 2
Publisher: IEEE Computer Society Press, ACM Press

Full text available: pdf(1.40 MB) Additional Information: full citation, abstract, references, citings, index terms

This paper presents the design and implementation of a high-performance special-purpose 
processor, called The White Dwarf, for accelerating finite element analysis algorithms. The 
White Dwarf CPU contains two Am29325 32-bit floating-point processors and one 
Am29332 32-bit ALU, and employs a wide-instruction word architecture in which the 
application algorithm is directly implemented in microcode. The entire system is VME-bus 
compatible and interfaces with a SUN 3/160 host. The syste ...

9 A hardware accelerator for maze routing
Y. Won, S. Sahni, Y. El-ziq

October 1987 Proceedings of the 24th ACM/IEEE conference on Design automation
DAC '87

Publisher: ACM Press

Full text available: pdf(871.73 KB) Additional Information: full citation, abstract, references, index terms

A hardware accelerator for the maze routing problem is developed. This accelerator 
consists of three 3-stage pipelines. Banked memory is used to avoid memory read/write
conflicts and obtain maximum efficiency. 

10 Computing multi-colored polygonal masks in pipeline architecture and its application
to automated visual inspection

Jorge L. C. Sanz, Its'hak Dinstein, Dragutin Petkovic

April 1987 Communications of the ACM, Volume 30 Issue 4
Publisher: ACM Press

Full text available: pdf(2.56 MB) Additional Information: full citation, abstract, references, citings, index terms

New techniques for computing multicolored polygonal masks for image analysis and 
computer vision applications are presented. The procedures do not require random access 
of the image memory. They are based on efficient generation of coordinate-reference 
images (ramps) and other simple general purpose architectural features such as look-up 
tables. The techniques presented are, unlike their predecessors, highly parallel and can be 
efficiently implemented in existing pipeline image processors. ... 

11 A high-speed network interface for distributed-memory systems: architecture and
applications

Peter Steenkiste

February 1997 ACM Transactions on Computer Systems (TOCS), Volume 15 Issue 1
Publisher: ACM Press

Full text available: pdf(993.12 KB) Additional Information: full citation, abstract, references, index terms, review



Distributed-memory systems have traditionally had great difficulty performing network 
I/O at rates proportional to their computational power. The problem is that the network 
interface has to support network I/O for a supercomputer, using computational and 
memory bandwidth resources similar to those of a workstation. As a result, the network 
interface becomes a bottleneck. In this article we present an I/O architecture that 
addresses these problems and supports high-speed network I/O on dist ... 

Keywords: I/O architecture, application-managed I/O, data reshuffling, distributed 
memory systems, network interface, outboard buffering, protocol processing, resource 
management 
12 Using Lookahead to reduce memory bank contention for decoupled operand references

Peter L. Bird, Richard A. Uhlig

August 1991 Proceedings of the 1991 ACM/IEEE conference on Supercomputing
Supercomputing '91

Publisher: ACM Press

Full text available: pdf(1.00 MB) Additional Information: full citation, references, citings, index terms




13 Accelerators: Automatic mapping of nested loops to FPGAs

Uday Bondhugula, J. Ramanujam, P. Sadayappan
March 2007 Proceedings of the 12th ACM SIGPLAN symposium on Principles and
practice of parallel programming PPoPP '07

Publisher: ACM Press

Full text available: pdf(245.06 KB) Additional Information: full citation, abstract, references, index terms

This paper presents a framework for automatic mapping of perfectly nested loops with
constant dependences onto regular processor arrays, suitable for direct implementation 
on Field Programmable Gate Arrays (FPGAs). The problem is modeled as that of finding a 
suitable completion procedure for a full-rank linear transformation on the iteration space. 
The approach enables extraction of necessary degrees of communication-free and 
pipelined parallelism to optimize performance under the resource con ... 

Keywords: FPGA, FPGA compilation, control signals, linear transformation, nested loops, 
regular processor arrays, resource constraints, scheduling 



14 GPMB — software pipelining branch-intensive loops
Zhihong Tang, Gang Chen, Chihong Zhang, Yingwei Zhang, Bogong Su, Stanley Habib
December 1993 Proceedings of the 26th annual international symposium on
Microarchitecture MICRO 26
Publisher: IEEE Computer Society Press

Full text available: pdf(906.47 KB) Additional Information: full citation, references, citings



Keywords: branch overlapping, branch-intensive loop-level parallelism, multi-branch 
switch, processing element 



15 Trident: a scalable architecture for scalar, vector, and matrix operations 
Mostafa I. Soliman, Stanislav G. Sedukhin 

January 2002 Australian Computer Science Communications, Proceedings of the
seventh Asia-Pacific conference on Computer systems architecture
CRPIT '02, Volume 24 Issue 3

Publisher: Australian Computer Society, Inc., IEEE Computer Society Press 

Full text available: pdf(814.51 KB) Additional Information: full citation, abstract, references, citings, index terms

Within a few years it will be possible to integrate a billion transistors on a single chip. At 
this integration level, we propose using a high level ISA to express parallelism to 
hardware instead of using a huge transistor budget to dynamically extract it. Since the 
fundamental data structures for a wide variety of applications are scalar, vector, and 
matrix, our proposed Trident processor extends the classical vector ISA with matrix 
operations. The Trident processor consists of a set of paralle ...

Keywords: data parallelism, parallel processing, ring register file, scalable hardware, 
vector/matrix processing 



16 Join processing in relational databases
Priti Mishra, Margaret H. Eich

March 1992 ACM Computing Surveys (CSUR), Volume 24 Issue 1
Publisher: ACM Press

Full text available: pdf(4.42 MB) Additional Information: full citation, abstract, references, citings, index terms, review

The join operation is one of the fundamental relational database query operations. It 
facilitates the retrieval of information from two different relations based on a Cartesian 
product of the two relations. The join is one of the most difficult operations to implement
efficiently, as no predefined links between relations are required to exist (as they are with 
network and hierarchical systems). The join is the only relational algebra operation that 
allows the combining of related tuples fro ... 

Keywords: database machines, distributed processing, join, parallel processing, 
relational algebra 



17 Reconfigurable computing: a survey of systems and software
Katherine Compton, Scott Hauck

June 2002 ACM Computing Surveys (CSUR), Volume 34 Issue 2
Publisher: ACM Press

Full text available: pdf(710.56 KB) Additional Information: full citation, abstract, references, citings, index terms, review

Due to its potential to greatly accelerate a wide variety of applications, reconfigurable 
computing has become a subject of a great deal of research. Its key feature is the ability 
to perform computations in hardware to increase performance, while retaining much of 
the flexibility of a software solution. In this survey, we explore the hardware aspects of 
reconfigurable computing machines, from single chip architectures to multi-chip systems, 
including internal structures and external coupling. W ... 

Keywords: Automatic design, FPGA, field-programmable, manual design, reconfigurable 
architectures, reconfigurable computing, reconfigurable systems 



18 A review of vessel extraction techniques and algorithms
Cemil Kirbas, Francis Quek

June 2004 ACM Computing Surveys (CSUR), Volume 36 Issue 2
Publisher: ACM Press

Full text available: pdf(8.06 MB) Additional Information: full citation, abstract, references, citings, index terms

Vessel segmentation algorithms are the critical components of circulatory blood vessel 
analysis systems. We present a survey of vessel extraction techniques and algorithms. We 
put the various vessel extraction approaches and techniques in perspective by means of a 
classification of the existing research. While we have mainly targeted the extraction of 
blood vessels, neurovascular structure in particular, we have also reviewed some of the
segmentation methods for the tubular objects that show ... 

Keywords: Magnetic resonance angiography, X-ray angiography, medical imaging, 
19 Splash 2

Jeffrey M. Arnold, Duncan A. Buell, Elaine G. Davis 

June 1992 Proceedings of the fourth annual ACM symposium on Parallel algorithms 
and architectures SPAA '92 

Publisher: ACM Press 

Full text available: pdf(719.35 KB) Additional Information: full citation, references, citings, index terms



20 Input data reuse in compiling window operations onto reconfigurable hardware
Zhi Guo, Betul Buyukkurt, Walid Najjar

June 2004 ACM SIGPLAN Notices, Proceedings of the 2004 ACM SIGPLAN/SIGBED
conference on Languages, compilers, and tools for embedded systems
LCTES '04, Volume 39 Issue 7
Publisher: ACM Press

Full text available: pdf(253.01 KB) Additional Information: full citation, abstract, references, citings, index terms

Balancing computation with I/O has been considered as a critical factor of the overall 
performance for embedded systems in general and reconfigurable computing systems in 
particular. Data I/O often dominates the overall computation performance for window 
operations, which are frequently used in image processing, image compression, pattern
recognition and digital signal processing. This problem is more acute in reconfigurable 
systems since the compiler must generate the data path and the sequence ... 

Keywords: VHDL, compilation, high-level synthesis, reconfigurable computing, reuse 
analysis 